51 research outputs found

    Snomed CT in a Language Isolate: an Algorithm for a Semiautomatic Translation

    Background: The Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) is officially released in English and Spanish. In the Basque Autonomous Community two languages, Spanish and Basque, are official. This paper presents the first attempt to semi-automatically translate the SNOMED CT terminology content into Basque, a less-resourced language. Methods: A translation algorithm based on Natural Language Processing methods has been designed and partially implemented. The algorithm comprises four phases, of which the first two have been implemented and quantitatively evaluated. Results: Results are promising: we obtained Basque equivalents for 21.41% of the disorder terms of the English SNOMED CT release. As the methods developed focus on that hierarchy, the results in other hierarchies are lower (12.57% for body structure descriptions, 8.80% for findings and 3% for procedures). Conclusions: We are on the way to reaching two of our objectives in translating SNOMED CT into Basque: to use our language to access rich multilingual resources and to strengthen the use of the Basque language in the biomedical area. This work was partially supported by the European Commission (325099), the Spanish Ministry of Science and Innovation (TIN2012-38584-C06-02) and the Basque Government (IT344-10 and IE12-333). Olatz Perez-de-Viñaspre's work is funded by a PhD grant from the Basque Government (BFI-2011-389).

    Biomedical abbreviation recognition and resolution by PROSA-MED

    The number of abbreviations used in biomedical literature increases constantly. Despite the existence of acronym dictionaries, it is not viable to keep them updated with new creations. Thus, in the processing of biomedical texts, discovering and disambiguating acronyms and their expanded forms are essential tasks, and this is the objective proposed by the BARR task at the IberEval 2017 Workshop. This paper presents our participation in this task. We propose five systems that deal with the problem in different ways. Three of the systems are atomic approaches, while the other two are combinations of the atomic systems. One of the systems clearly outperforms the others, both in detecting entities (F-score of 0.749 on the test set) and in identifying relations between short and long forms (F-score of 0.697 on the test set). Peer reviewed. Postprint (published version).

    Cause of Death estimation from Verbal Autopsies: Is the Open Response redundant or synergistic?

    Civil registration and vital statistics systems capture birth and death events to compile vital statistics and to provide legal rights to citizens. Vital statistics are a key factor in promoting public health policies and the health of the population. Medical certification is the preferred source of cause of death (CoD) information. However, two thirds of all deaths worldwide are not captured in routine mortality information systems and their cause of death is unknown. Verbal autopsy is an interim solution for estimating the cause of death distribution at the population level in the absence of medical certification. A Verbal Autopsy (VA) consists of an interview with a relative or caregiver of the deceased. The VA includes both Closed Questions (CQs) with structured answer options and an Open Response (OR), a free narrative of the events expressed in natural language without any pre-determined structure. A number of automated systems analyze the CQs to obtain cause-specific mortality fractions, but their performance is limited. We hypothesize that the text provided by the OR might convey relevant information for discerning the CoD. The experimental layout compares existing Computer Coding Verbal Autopsy methods such as Tariff 2.0 with other approaches well suited to processing structured inputs such as the CQs. Next, alternative approaches based on language models are employed to analyze the OR. Finally, we propose a new method with a bi-modal input that combines the CQs and the OR. Empirical results corroborate that our method, which takes into account the valuable information conveyed by the OR, outperforms the Tariff 2.0 algorithm in CoD prediction. As an added value, with this work we make the software available to enable reproduction of the results, including a version implemented in R to make the comparison with Tariff 2.0 evident. This work was partially funded by the Spanish Ministry of Science and Innovation (DOTT-HEALTH/PAT-MED PID2019-106942RB-C31); by both Antidote PCI2020-120717-2 and Lotu TED2021-130398B-C22 funded by the MCIN/AEI / 10.13039/501100011033 and by the European Union NextGenerationEU/ PRTR; by the Basque Government (IXA IT-1570-22); and by Misiones Euskampus 2.0 (EXTEPA).

    A Cascaded Syntactic Analyser for Basque

    This article presents a robust syntactic analyser for Basque and the different modules it contains. Each module is structured in analysis layers, in which each layer takes the information provided by the previous layer as its input, thus creating a gradually deeper syntactic analysis in cascade. This analysis is carried out using the Constraint Grammar (CG) formalism. Moreover, the article describes the standardisation process of the parsing formats using XML.
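
The cascade idea in the abstract above, where each layer consumes the previous layer's output, can be illustrated with a toy pipeline. This is only a sketch of the control flow: it does not use Constraint Grammar, and the layer functions (`tokenize`, `mark_capitalized`, `group_runs`) and the sample sentence are hypothetical stand-ins, not the analyser's actual modules.

```python
from functools import reduce

def tokenize(analysis):
    # Layer 1 (hypothetical): split the raw text into whitespace-separated tokens.
    return {**analysis, "tokens": analysis["text"].split()}

def mark_capitalized(analysis):
    # Layer 2 (hypothetical): a toy tagging step that relies on layer 1's tokens.
    tags = ["CAP" if tok[:1].isupper() else "LOW" for tok in analysis["tokens"]]
    return {**analysis, "tags": tags}

def group_runs(analysis):
    # Layer 3 (hypothetical): group consecutive tokens sharing a tag,
    # which relies on the output of both earlier layers.
    chunks = []
    for tok, tag in zip(analysis["tokens"], analysis["tags"]):
        if chunks and chunks[-1][0] == tag:
            chunks[-1][1].append(tok)
        else:
            chunks.append((tag, [tok]))
    return {**analysis, "chunks": chunks}

def run_cascade(text, layers):
    # Feed each layer the accumulated analysis produced by the previous one.
    return reduce(lambda analysis, layer: layer(analysis), layers, {"text": text})

result = run_cascade(
    "Bilbao eta Donostia hiriak dira",
    [tokenize, mark_capitalized, group_runs],
)
print(result["chunks"])
```

Because every layer returns the full accumulated analysis, a later layer can read anything produced earlier, which is the property that lets a cascade deepen the analysis step by step.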

    Aportaciones de las técnicas de aprendizaje automático a la clasificación de partes de alta hospitalarios reales en castellano

    Hospitals attached to the Spanish Ministry of Health currently use the International Classification of Diseases, 9th Revision, Clinical Modification (ICD9-CM) to classify hospital discharge records. At present, this work is done manually by experts. This paper tackles the automatic classification of real discharge records in Spanish following the ICD9-CM standard. The challenge is that the discharge records are written in spontaneous language. We explore several machine learning techniques to deal with the classification problem. Random Forest proved the most competitive of those tested, achieving an F-measure of 0.876. This work was partially supported by the European Commission (SEP-210087649), the Spanish Ministry of Science and Innovation (TIN2012-38584-C06-02) and the Industry of the Basque Government (IT344-10).

    IRAKAZI: a web-based system to assess the learning process of Basque language learners

    IRAKAZI is a teacher-oriented web-based system designed for studying the learning process of learners of Basque. The system draws on a wide background in NLP tools, error detection and ICALL environments, and it is easily transferable to other languages.

    Primeros pasos en la anotación tanto manual como automática de informes clínicos en Español

    This paper presents the annotation of a corpus of gynaecology and obstetrics patient records and the development of an annotation scheme for its manual tagging. We focus our description on the manual annotation of the clinical notes and on the adaptation of the Natural Language Processing analyzer Freeling to the medical domain.

    Erroreak automatikoki detektatzeko tekniken azterlana eta euskararentzako aplikazioak

    In this article, we study the techniques used for detecting errors in Natural Language Processing (NLP). We classify the techniques according to their approach (symbolic or empirical), and then we describe them in depth. Following that, we describe the systems we have developed for detecting syntactic errors in Basque, using the technique as the criterion for classifying those systems and illustrating the description with examples.

    Proyecto de transferencia tecnológica Deteami: tecnologías de procesamiento del lenguaje natural para la ayuda en farmacia y en farmacovigilancia

    The goal of the Deteami project is to develop tools that make clinicians aware of adverse drug reactions stated in the electronic health records of the digital clinical history. The records produced in hospitals are a valuable though nearly unexplored source of information, mainly because privacy and confidentiality restrictions make them hard to obtain. To ease the clinicians' work of reading and analyzing health records in search of information about patients' health, in this project we explore the records automatically, identify entities such as disorders and drugs, and infer medical information, in this case adverse drug reactions. A collaboration framework was established with the Galdakao-Usansolo and Basurto Hospitals of Osakidetza (the Basque Health System). Osakidetza provided the texts and end-user feedback, as well as specialists who annotated the corpora; in this way, we obtained a gold standard. This work was partially supported by the Spanish Ministry of Science and Innovation (EXTRECM: TIN2013-46616-C2-1-R, TADEEP: TIN2015-70214-P) and the Basque Government (DETEAMI: Ministry of Health 2014111003, IXA Research Group of type A (2010-2015), ELKAROLA: KK-2015/00098).

    Agrupaciones para la extracción de entidades clínicas

    Health records are a valuable source of clinical knowledge, and Natural Language Processing techniques have previously been applied to the text in health records for a number of applications. Often, a first step in clinical text processing is clinical entity recognition: identifying, for example, drugs, disorders, and body parts in clinical text. However, most of this work has focused on records in English. Therefore, this work aims to improve clinical entity recognition for languages other than English by comparing the same methods on two different languages, specifically by employing ensemble methods. Models were created for Spanish and Swedish health records using SVM, Perceptron, and CRF and four different feature sets, including unsupervised features. Finally, the models were combined in ensembles. Weighted voting was applied according to the models' individual F-scores. In conclusion, the ensembles improved the overall performance for Spanish and the precision for Swedish; for further improvement, more sophisticated ensemble methods should be investigated. This work has been partially funded by the Spanish ministry (PROSAMED: TIN2016-77820-C3-1-R, TADEEP: TIN2015-70214-P), the Basque Government (DETEAMI: 2014111003), the University of the Basque Country UPV-EHU (MOV17/14) and the Nordic Center of Excellence in Health-Related e-Sciences (NIASC).
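
The weighted voting mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `weighted_vote`, the per-token label format, the sample predictions and the F-score values are all hypothetical. Each model casts one vote per token, weighted by its own F-score, and the label with the largest total wins.

```python
from collections import defaultdict

def weighted_vote(predictions, f_scores):
    """Combine per-token labels from several models by F-score-weighted voting.

    predictions: dict mapping model name -> list of labels, one per token
    f_scores:    dict mapping model name -> that model's F-score (vote weight)
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must label the same number of tokens")
    n_tokens = lengths.pop()

    combined = []
    for i in range(n_tokens):
        totals = defaultdict(float)
        for model, labels in predictions.items():
            # Each model adds its F-score to the label it predicted for token i.
            totals[labels[i]] += f_scores[model]
        # Pick the label with the highest summed weight.
        combined.append(max(totals, key=totals.get))
    return combined

# Hypothetical model outputs for three tokens and hypothetical F-scores.
predictions = {
    "svm":        ["Drug", "O", "Disorder"],
    "perceptron": ["O",    "O", "Disorder"],
    "crf":        ["O",    "O", "O"],
}
f_scores = {"svm": 0.70, "perceptron": 0.60, "crf": 0.75}

combined = weighted_vote(predictions, f_scores)
print(combined)
```

Unlike simple majority voting, a single strong model can outvote two weaker ones when its weight exceeds their combined weight, which is the design choice that weighting by F-score introduces.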